Fine-tuning Transformer Models using Transfer Learning for Multilingual Threatening Text Identification
Authors
Abstract
Threatening content detection on social media has recently gained attention. However, there is very limited work on threatening-language detection in low-resource languages, especially Urdu, and previous studies explored only mono-lingual approaches; the multi-lingual setting was not studied. This research addresses the task of Multi-lingual Threatening Content Detection (MTCD) in the Urdu and English languages by exploiting a transfer-learning methodology with fine-tuning techniques. To address the task, we investigate two methodologies: 1) a joint multi-lingual method and 2) a joint-translated method. The former builds a single universal classifier for the different languages, whereas the latter applies a translation step to transform all text into one language and then performs classification. We explore Multilingual Representations for Indian Languages (MuRIL) and the Robustly Optimized BERT Pre-Training Approach (RoBERTa), which have already demonstrated state-of-the-art ability to capture contextual and semantic characteristics of text. For hyper-parameters, manual search and grid search strategies are utilized to find optimum values. Various experiments performed on the bi-lingual datasets reveal that the proposed approach outperforms the baselines and establishes benchmark performance: the RoBERTa model achieved the highest performance, with 92% accuracy and 90% macro F1-score under the joint multi-lingual approach.
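The joint multi-lingual methodology can be sketched as pooling Urdu and English examples into one training corpus with a shared label space and fitting a single classifier, with hyper-parameters chosen by grid search. The sketch below uses a TF-IDF plus logistic-regression pipeline as a lightweight stand-in for the fine-tuned MuRIL/RoBERTa encoders, and the texts and labels are invented placeholders, not data from the paper.

```python
# Joint multi-lingual sketch: one classifier trained over both languages.
# TF-IDF + logistic regression stands in for a fine-tuned transformer;
# all examples below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Placeholder (text, label) pairs; 1 = threatening, 0 = benign.
english = [("I will hurt you", 1), ("See you at lunch", 0)]
urdu = [("میں تمہیں نقصان پہنچاؤں گا", 1), ("کل ملاقات ہو گی", 0)]

# Joint multi-lingual corpus: both languages share one label space.
texts, labels = zip(*(english + urdu))

pipeline = Pipeline([
    # Character n-grams work across scripts without language-specific tokenizers.
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search over hyper-parameters, mirroring the paper's grid strategy
# (the parameter grid here is illustrative only).
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(list(texts), list(labels))
print(grid.best_params_)
```

The joint-translated variant would instead pass all text through a translation step first, so the classifier sees a single language; the pooling and grid-search machinery stays the same.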
Similar resources
Optimize transfer learning for lung diseases in bronchoscopy using a new concept: sequential fine-tuning
Bronchoscopy inspection, as a follow-up procedure to radiological imaging, plays a key role in lung disease diagnosis and in determining treatment plans for patients. Doctors need to decide in a timely manner whether to biopsy a patient when performing bronchoscopy. However, doctors also need to be very selective with biopsies, as biopsies may cause uncontrollable bleeding of the lung...
Multilingual Topic Models for Unaligned Text
We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text that is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to simultaneously discover both a matching between the languages and multilingual latent topics. We demonstrate that MuTo is able to find shared topics on real-world multilingual...
Learning Distributed Representations for Multilingual Text Sequences
We propose a novel approach to learning distributed representations of variable-length text sequences in multiple languages simultaneously. Unlike previous work which often derive representations of multi-word sequences as weighted sums of individual word vectors, our model learns distributed representations for phrases and sentences as a whole. Our work is similar in spirit to the recent parag...
Improve the performance of transfer learning without fine-tuning using dissimilarity-based multi-view learning for breast cancer histology images
Breast cancer is one of the most common types of cancer and a leading cause of cancer-related death among women. In the context of the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images, we compare one handcrafted feature extractor and five transfer-learning feature extractors based on deep learning. We find that deep learning networks pretrained on ImageNet have better performance than...
Fine Tuning in Supersymmetric Models
The solution to fine tuning is one of the principal motivations for Beyond the Standard Model (BSM) studies. However, constraints on new physics indicate that many of these BSM models are also fine tuned (although to a much lesser extent). To compare these BSM models it is essential that we have a reliable, quantitative measure of tuning. We review the measures of tuning used in the literature a...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3320062